METHOD APPLIED TO LINEAR INVERSE PROBLEMS
Abstract. The Augmented Lagrangian Method as an approach for regularizing inverse problems
has received much attention recently, e.g. under the name Bregman iteration in imaging. This work shows
convergence (rates) for this method when Morozov's discrepancy principle is chosen as a stopping 
rule. Moreover, error estimates for the involved sequence of subgradients are pointed out. 

The paper studies implications of these results for particular examples motivated by applications 
in imaging. These include the total variation regularization as well as $\ell^q$ penalties with $q \in [1,2]$.
It is shown that Morozov's principle implies convergence (rates) for the iterates with respect to the
metric of strict convergence and the $\ell^q$-norm, respectively.
1. Introduction. A classical problem in optimization is the solution of

$$J(u) \to \min \quad \text{subject to} \quad Ku = g, \qquad (1.1)$$

where $J : H_1 \to \mathbb{R} \cup \{\infty\}$ is a convex functional and $K : H_1 \to H_2$ is a linear and
bounded operator between Hilbert spaces $H_1$ and $H_2$. Solutions of problem (1.1) are
called $J$-minimizing solutions of the equation $Ku = g$.

Of particular interest are ill-posed equations, that is, when the solution of $Ku = g$
does not depend continuously on the data $g$ (as is the case, e.g., if $K$ has non-closed
range). This becomes distinctly delicate if the data $g$ is not available precisely but
only through noise-affected observations $g^\delta$, for which we assume that we have the additional
information

$$\|g^\delta - g\| \le \delta.$$

It is a natural question to ask: "When does a solution algorithm for the optimization
problem (1.1), applied to perturbed data $g^\delta$ instead of $g$, constitute a regularization
method for the ill-posed equation $Ku = g$?" In [12] an affirmative answer was given
for the Augmented Lagrangian Method (ALM), which in the context of regularization
is also known as the Bregman iteration (see [20]). The ALM was introduced simultaneously
by Hestenes [13] and Powell [21] as an iterative solution method for (1.1) and
reads as follows:

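Algorithm 1 (the ALM). Let $p_0^\delta \in H_2$ and choose a sequence $\{\tau_n\}_{n\in\mathbb{N}}$ of positive
parameters. For $n = 1, 2, \ldots$ compute

$$u_n^\delta \in \operatorname*{argmin}_{u \in H_1} \left( \frac{\tau_n}{2}\,\|Ku - g^\delta\|^2 + J(u) - \langle p_{n-1}^\delta, Ku - g^\delta \rangle \right) \quad \text{and} \qquad (1.2a)$$

$$p_n^\delta = p_{n-1}^\delta + \tau_n\,(g^\delta - Ku_n^\delta). \qquad (1.2b)$$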
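To make the two steps of Algorithm 1 concrete, the following is a minimal sketch under strong simplifying assumptions that are not taken from the paper: $K$ is a diagonal matrix with entries `sigma`, and $J(u) = \frac{1}{2}\|u\|^2$, so that the minimization (1.2a) can be solved in closed form componentwise. The function names and the test data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def alm_diagonal_quadratic(
    sigma: Sequence[float],
    g_delta: Sequence[float],
    taus: Sequence[float],
    p0: Optional[Sequence[float]] = None,
) -> List[Tuple[List[float], List[float]]]:
    """Algorithm 1 for K = diag(sigma) and J(u) = 0.5 * ||u||^2 (illustrative choice)."""
    p = list(p0) if p0 is not None else [0.0] * len(sigma)
    history = []
    for tau in taus:
        # (1.2a): the optimality condition tau*K^T(Ku - g) + u - K^T p = 0
        # reads (1 + tau*s^2) * u = s * (tau*g + p) in each component.
        u = [s * (tau * g + pi) / (1.0 + tau * s * s)
             for s, g, pi in zip(sigma, g_delta, p)]
        # (1.2b): multiplier update p_n = p_{n-1} + tau_n * (g - K u_n).
        p = [pi + tau * (g - s * ui)
             for pi, g, s, ui in zip(p, g_delta, sigma, u)]
        history.append((u, p))
    return history


def residual_norm(sigma: Sequence[float], u: Sequence[float],
                  g_delta: Sequence[float]) -> float:
    """Euclidean norm of K u - g for K = diag(sigma)."""
    return sum((s * ui - g) ** 2 for s, ui, g in zip(sigma, u, g_delta)) ** 0.5


# Hypothetical test problem with decaying "singular values".
sigma = [1.0, 0.5, 0.1]
g_delta = [1.0, -1.0, 0.3]
history = alm_diagonal_quadratic(sigma, g_delta, taus=[1.0] * 50)
residuals = [residual_norm(sigma, u, g_delta) for u, _ in history]
```

In this toy setting the residuals decrease monotonically, and each pair of iterates satisfies $K^* p_n^\delta = u_n^\delta$, which is the relation $K^* p_n^\delta \in \partial J(u_n^\delta)$ for this particular $J$.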

The name Augmented Lagrangian stems from the fact that the functional

$$\mathcal{L}(u,p) = J(u) - \langle p, Ku - g^\delta \rangle$$
is the Lagrangian for (1.1) and the additional term $\frac{\tau_n}{2}\|Ku - g^\delta\|^2$ is an augmentation
of $\mathcal{L}$ that fosters the fulfillment of the constraint. Hence, in the limit, the augmentation
term is supposed to vanish and the variables $p_n$ shall tend to a Lagrange multiplier
for the problem (1.1).

It is well known that the Karush-Kuhn-Tucker conditions are necessary and sufficient
regularity conditions for the solutions of (1.1) which guarantee existence of a
saddle point of $\mathcal{L}$. Thus, if there exist $u^\dagger \in H_1$ and $p^\dagger \in H_2$ such that

$$Ku^\dagger = g \quad \text{and} \quad K^*p^\dagger \in \partial J(u^\dagger),$$

then $\mathcal{L}(u^\dagger, p) \le \mathcal{L}(u^\dagger, p^\dagger) \le \mathcal{L}(u, p^\dagger)$. It was pointed out in 0] that this coincides
with the standard source conditions in regularization theory.

As in [12], we will consider the ALM as a regularization method, that is, for
stably computing approximations of solutions of (1.1) from perturbed data $g^\delta$. With
$\mathcal{R}_n : H_2 \to H_1$ and $\mathcal{R}_n^* : H_2 \to H_2$ we denote the operators defined by

$$\mathcal{R}_n(g^\delta) := u_n^\delta \quad \text{and} \quad \mathcal{R}_n^*(g^\delta) := p_n^\delta, \quad \text{respectively.}$$



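The paper [12] came up with a characterization of parameter choice rules $\Gamma : (0, \infty) \times H_2 \to \mathbb{N}$
such that for each solution $u^\dagger$ of (1.1)

$$\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k) \to u^\dagger \quad \text{as} \quad \|g - g_k\| =: \delta_k \to 0$$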

in an appropriate sense. Under a standard source condition, it also showed convergence
rates for a class of stopping rules $\Gamma(\delta, g^\delta)$ for which $\Gamma(\delta, g^\delta) \to \infty$ as $\delta \to 0$. We
pursue that study further and mainly show that Morozov's discrepancy principle does
belong to the above mentioned class. Moreover, we investigate the degenerate case of
the discrepancy principle, that is, when $\{\Gamma(\delta, g^\delta)\}$ has finite accumulation points. Note
that the complex challenge of choosing a right regularization parameter when dealing
with stabilization methods for improperly posed problems is frequently approached
via Morozov's rule due to its natural heuristic motivation. Namely, this rule selects a
parameter by comparing the residual $\|Ku_n^\delta - g^\delta\|$ with the presumably known noise
level $\delta$; see, e.g., [11, Ch. 4].

In [12], the implications of general convergence analysis for the ALM were emphasized
for the case of quadratic functionals $J$ (cf. Example 1). In particular, the
authors pointed out that in this case the ALM is equivalent to the Tikhonov-Morozov
method (cf. [15]). Here, we will study in more detail two choices for $J$ that are
especially appealing for inverse problems occurring in imaging:

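i) Total-variation regularization (cf. 0, [H, [24]). Let $H_1 = L^2(\Omega)$ for a bounded
domain $\Omega \subset \mathbb{R}^2$ and consider the function

$$J(u) = \begin{cases} |Du|(\Omega) & \text{if } u \in BV(\Omega), \\ +\infty & \text{else.} \end{cases} \qquad (1.3)$$

Here, $|Du|(\Omega)$ denotes the total variation of the (measure-valued) distributional
derivative of $u$.

ii) Sparse regularization (cf. 0, [H, [18]). Let $H_1 = \ell^2$ and

$$J(u) = \begin{cases} \sum_{k} |u_k|^q & \text{if } u \in \ell^q, \\ +\infty & \text{otherwise,} \end{cases} \qquad (1.4)$$

with $1 \le q \le 2$.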
This work is organized as follows. Section 2 presents the main notions and notation,
while Section 3 recalls several results in [12] and proposes some extensions of them.
For instance, upper bounds for the Bregman distance between the subgradients of
the objective functional $J$ in (1.1) corresponding to the iterates and the solution,
respectively, are obtained. Section 4 shows that the ALM together with Morozov's
discrepancy principle leads to stable approximations for the operator equation both in
the nondegenerate and degenerate cases. The results are applied to the total variation
setting in Section 5, by highlighting strict convergence (rates) for the primal variables.
Section 6 summarizes the knowledge on the ALM for the sparsity regularization setting,
i.e. convergence rates for the primal variables with respect to the $\ell^q$-norm and
for the subgradients of these variables with respect to Bregman distances (1 < q < 2)
and dual norms (1 < q < 2).

2. Basic Definitions and some Notation. 

2.1. Basic Assumptions. Throughout this paper we will assume that $H_1$ and
$H_2$ are separable Hilbert spaces with inner products $\langle \cdot, \cdot \rangle$ and norms $\|\cdot\|$ (not further
specified since the meaning is always clear from the context). We will frequently make
use of Young's inequality, which states that for all $u, v \in H_1$ and $\gamma > 0$ one has that

$$|\langle u, v \rangle| \le \frac{1}{2\gamma}\,\|u\|^2 + \frac{\gamma}{2}\,\|v\|^2.$$

We assume further that $K : H_1 \to H_2$ is a linear and bounded operator and that
$J : H_1 \to \overline{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$ is convex, lower semi-continuous (l.s.c.) and proper, that is,
the domain

$$D(J) = \{u \in H_1 : J(u) < \infty\}$$

is non-empty. In order to guarantee that $J$-minimizing solutions of $Ku = g$ exist and
that Algorithm 1 is well defined, we need to impose additional restrictions (cf. [12,
Lem. 3.1]):

Assumption 1. The sub-level sets of the functional

$$u \mapsto \|Ku\|^2 + J(u)$$

are sequentially pre-compact with respect to the weak topology on $H_1$. That is, for
every $c \in \mathbb{R}$, every sequence $\{u_n\}_{n\in\mathbb{N}}$ contained in the sub-level set

$$\Lambda(c) = \{u \in H_1 : \|Ku\|^2 + J(u) \le c\}$$

has a weakly convergent subsequence in $H_1$.

Moreover, we will assume that $\{\tau_n\}_{n\in\mathbb{N}}$ in Algorithm 1 is a fixed sequence of positive
regularization parameters which can be considered as step-sizes for the iterations.
We will make use of the quantity

$$t_n := \sum_{k=1}^{n} \tau_k.$$

The case of constant parameter $\tau_n \equiv \tau$ is known as the stationary augmented Lagrangian
method and leads to $t_n = n\tau$. We will only require that

$$\lim_{n\to\infty} t_n = +\infty \quad \text{and} \quad \sup_{n\in\mathbb{N}} \tau_n =: \bar\tau < \infty, \qquad (2.1)$$
i.e., the $\tau_n$'s do not decay too quickly and stay bounded.

Finally, we will assume that $g \in H_2$ is an attainable element, that is, there exists
a $u \in D(J)$ such that $Ku = g$. By $g^\delta \in H_2$ we always denote a perturbed version of
$g$ satisfying $\|g^\delta - g\| \le \delta$. For $k \in \mathbb{N}$, we will abbreviate $g_k := g^{\delta_k}$ with $\delta_k \to 0$ as
$k \to \infty$.

2.2. Convex Analysis. In the course of this paper we will frequently use some
tools from convex analysis. A standard reference in this respect is [10].

The subdifferential (or generalized derivative) $\partial J(u)$ of $J$ at $u$ is the set of all
elements $\xi \in H_1$ satisfying

$$J(v) - J(u) - \langle \xi, v - u \rangle \ge 0 \quad \text{for all } v \in H_1.$$

The domain $D(\partial J)$ of the subgradient consists of all $u \in H_1$ for which $\partial J(u) \neq \emptyset$.
Finally, we define the graph of $\partial J$ as

$$\operatorname{Gr}(\partial J) := \{(u, \xi) \in H_1 \times H_1 : \xi \in \partial J(u)\}.$$

According to [10, Chap. I, Cor. 5.1], the set $\operatorname{Gr}(\partial J)$ is sequentially closed with respect
to the weak-strong topology on $H_1 \times H_1$. That is, if the sequence $\{(u_n, v_n)\}_{n\in\mathbb{N}}$ of
elements in $\operatorname{Gr}(\partial J)$ satisfies that $u_n$ converges weakly to $u$ and $v_n$ converges strongly
to $v$, then $(u, v) \in \operatorname{Gr}(\partial J)$.

The functional $J^* : H_1 \to \overline{\mathbb{R}}$ denotes the Legendre-Fenchel transform (or the dual
functional) of $J$, which is defined by

$$J^*(v) := \sup_{u \in H_1} \big( \langle v, u \rangle - J(u) \big).$$

Since $J^*$ is the pointwise supremum of affine functions, it is convex, l.s.c. and
proper [10, Chap. I, Prop. 3.1]. Moreover, one has [10, Chap. I, Cor. 5.2]

$$v \in \partial J(u) \iff u \in \partial J^*(v).$$

Furthermore, it follows from the definition of the subgradient that

$$u \in \partial J^*(K^*p) \implies Ku \in \partial (J^* \circ K^*)(p).$$

For $u \in D(\partial J)$ and $v \in D(J)$, the Bregman distance of $J$ between $v$ and $u$ with
respect to $\xi \in \partial J(u)$ is defined by

$$D_J^\xi(v, u) = J(v) - J(u) - \langle \xi, v - u \rangle.$$

We will skip the superscript $\xi$ if the choice of the subgradient is obvious. If additionally
$v \in D(\partial J)$ and $\eta \in \partial J(v)$, we further define the symmetric Bregman distance
by

$$D_J^{\mathrm{sym}}(v, u) = D_J^\xi(v, u) + D_J^\eta(u, v) = \langle \eta - \xi, v - u \rangle.$$

Note that the convexity of $J$ implies that $D_J$ and $D_J^{\mathrm{sym}}$ are always non-negative.
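The two definitions above can be checked numerically in finite dimensions. The sketch below assumes, purely for illustration, the differentiable functional $J(u) = \sum_k |u_k|^{q}$ with $q = 1.5$ on $\mathbb{R}^3$, for which the subdifferential reduces to the gradient; the helper names and the vectors `u`, `v` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Q = 1.5  # illustrative exponent with 1 < Q <= 2, so J is differentiable


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def J(u: Sequence[float]) -> float:
    return sum(abs(x) ** Q for x in u)


def grad_J(u: Sequence[float]) -> List[float]:
    # Derivative of |x|^Q is Q * sign(x) * |x|^(Q-1); it vanishes at x = 0.
    return [Q * math.copysign(abs(x) ** (Q - 1.0), x) for x in u]


def bregman(J_fun: Callable, grad: Callable,
            v: Sequence[float], u: Sequence[float]) -> float:
    """D_J^xi(v, u) with xi = grad J(u)."""
    diff = [vi - ui for vi, ui in zip(v, u)]
    return J_fun(v) - J_fun(u) - dot(grad(u), diff)


def symmetric_bregman(grad: Callable,
                      v: Sequence[float], u: Sequence[float]) -> float:
    """<eta - xi, v - u> with eta = grad J(v), xi = grad J(u)."""
    eta_minus_xi = [e - x for e, x in zip(grad(v), grad(u))]
    return dot(eta_minus_xi, [vi - ui for vi, ui in zip(v, u)])


u = [1.0, -2.0, 0.5]
v = [0.0, 1.0, 3.0]
d_vu = bregman(J, grad_J, v, u)
d_uv = bregman(J, grad_J, u, v)
d_sym = symmetric_bregman(grad_J, v, u)
```

Both one-sided distances are non-negative here, and their sum agrees with the inner-product formula for the symmetric distance up to rounding.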

Example 1. Let $H$ be a Hilbert space and $L : D(L) \subset H_1 \to H$ be a linear and
closed operator with dense domain $D(L)$. Then, the quadratic functional

$$J(u) = \begin{cases} \frac{1}{2}\|Lu\|^2 & \text{if } u \in D(L), \\ +\infty & \text{else,} \end{cases}$$
is convex, lower semi-continuous and proper. Moreover, for $u \in D(\partial J) = D(L^*L)$
the subgradient $\partial J(u)$ coincides with the set $\{L^*Lu\}$ (cf. Lem. 2.4]). This finally
implies that

$$D_J^{\mathrm{sym}}(v, u) = \|L(v - u)\|^2.$$



2.3. Source Condition. It is well known that regularization methods for the
reconstruction of a solution of (1.1) in general converge arbitrarily slowly, unless
further regularity is imposed on the solution [11]. In the general setup presented in this paper,
this is usually done in terms of the standard source condition, that is, there exists
an element $p^\dagger \in H_2$ (the source element) such that

$$K^*p^\dagger \in \partial J(u^\dagger). \qquad (2.2)$$

3. Summary and extensions of previous results. In this section we summarize
the results on regularization by means of the ALM as presented in [12]. We
further derive an extended error estimate that allows for convergence rates of the
sequence $K^*p_n^\delta$ in the Bregman distance associated with the Fenchel conjugate $J^*$.

The dual characterization of the ALM by the proximal point method plays a
central role in the convergence analysis in [12]. This observation dates back to the
work of Rockafellar in [23]. In the current context, defining $G : H_2 \times H_2 \to \overline{\mathbb{R}}$ by

$$G(p, g) = J^*(K^*p) - \langle p, g \rangle, \qquad (3.1)$$

it holds (cf. [12, Prop. 4.2])

$$p_n^\delta = \operatorname*{argmin}_{p \in H_2} \left( \frac{1}{2}\,\|p - p_{n-1}^\delta\|^2 + \tau_n\, G(p; g^\delta) \right). \qquad (3.2)$$
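The agreement between the ALM update and the proximal step on the dual functional can be verified by hand in a scalar toy problem. The sketch below assumes (as an illustration only, not the paper's setting) $H_1 = H_2 = \mathbb{R}$, $Ku = s\,u$ and $J(u) = \frac{1}{2}u^2$, so that $J^*(v) = \frac{1}{2}v^2$ and $G(p; g) = \frac{1}{2}(sp)^2 - pg$; all names and numbers are hypothetical.

```python
from typing import Tuple


def alm_step(s: float, g: float, p_prev: float, tau: float) -> Tuple[float, float]:
    """One step of (1.2a)-(1.2b) for K u = s*u and J(u) = 0.5*u^2."""
    u = s * (tau * g + p_prev) / (1.0 + tau * s * s)   # minimizer of (1.2a)
    p = p_prev + tau * (g - s * u)                     # update (1.2b)
    return u, p


def dual_objective(p: float, s: float, g: float, p_prev: float, tau: float) -> float:
    """0.5*(p - p_prev)^2 + tau*G(p; g) with G(p; g) = 0.5*(s*p)^2 - p*g."""
    return 0.5 * (p - p_prev) ** 2 + tau * (0.5 * (s * p) ** 2 - p * g)


s, g, p_prev, tau = 0.5, -1.0, 0.3, 2.0
u_n, p_n = alm_step(s, g, p_prev, tau)

# Closed-form minimizer of the strictly convex dual objective:
# (p - p_prev) + tau*(s^2 * p - g) = 0.
p_prox = (p_prev + tau * g) / (1.0 + tau * s * s)
```

In this scalar case `p_n` and `p_prox` coincide, and perturbing `p_n` in either direction does not decrease `dual_objective`.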



The basis of the results in [12] is the following estimate on the iterates in (3.2), which
was established by Güler in [la, Lem. 2.2]:

PROPOSITION 3.1. For all $n \in \mathbb{N}$ and all $p \in H_2$ one has

$$G(p_n^\delta, g^\delta) - G(p, g^\delta) \le \frac{\|p - p_0^\delta\|^2}{2t_n} - \frac{\|p - p_n^\delta\|^2}{2t_n} - \frac{t_n\,\|p_n^\delta - p_{n-1}^\delta\|^2}{2\tau_n^2}. \qquad (3.3)$$

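This result leads to the general convergence result [12, Thm. 5.3]:
Theorem 3.2. Assume that the stopping rule $\Gamma : (0, \infty) \times H_2 \to \mathbb{N}$ satisfies

$$\lim_{k\to\infty} \delta_k^2\, t_{\Gamma(\delta_k, g_k)} = 0 \quad \text{and} \quad \lim_{k\to\infty} t_{\Gamma(\delta_k, g_k)} = +\infty. \qquad (3.4)$$

Then, the sequence $\{\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k)\}_{k\in\mathbb{N}}$ is bounded and each weak cluster point is
a $J$-minimizing solution of $Ku = g$. Additionally, with $\xi_k = K^*\mathcal{R}^*_{\Gamma(\delta_k, g_k)}(g_k) \in
\partial J(\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k))$ it holds

$$\lim_{k\to\infty} J(\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k)) = J(u^\dagger) \quad \text{and} \quad \lim_{k\to\infty} D_J^{\xi_k}(u^\dagger, \mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k)) = 0, \qquad (3.5)$$

and the residuum satisfies the rate

$$\|K\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k) - g\| = O\big(t_{\Gamma(\delta_k, g_k)}^{-1/2}\big). \qquad (3.6)$$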
As indicated in Section 2.3, the speed of convergence in (3.5) can be arbitrarily slow,
unless one imposes regularity restrictions on the true solutions of $Ku = g$. We recall
below Theorem 6.3 from [12] in this respect.

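Theorem 3.3. Assume that the stopping rule $\Gamma : (0, \infty) \times H_2 \to \mathbb{N}$ satisfies
$\lim_{k\to\infty} t_{\Gamma(\delta_k, g_k)} = +\infty$. Then the following two conditions are equivalent:

(i) There exists a $J$-minimizing solution $u^\dagger$ of $Ku = g$ that satisfies the source
condition (2.2) with source element $p^\dagger \in H_2$ and there exists $C \in \mathbb{R}$ such that

$$\delta_k\, t_{\Gamma(\delta_k, g_k)} \le C. \qquad (3.7)$$

(ii) For $k \to \infty$, one has

$$\|K\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k) - g\| = O\big(t_{\Gamma(\delta_k, g_k)}^{-1}\big) \quad \text{and} \quad \|\mathcal{R}^*_{\Gamma(\delta_k, g_k)}(g_k)\| = O(1). \qquad (3.8)$$
Additionally, if (i) or (ii) holds, then

$$D_J^{K^*p^\dagger}\big(\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k), u^\dagger\big) = O\big(t_{\Gamma(\delta_k, g_k)}^{-1}\big)$$

and each cluster point of $\{\mathcal{R}^*_{\Gamma(\delta_k, g_k)}(g_k)\}_{k\in\mathbb{N}}$ is a minimizer of $G(\cdot, g)$.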

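Theorem 3.4 and Corollary 3.6 below provide quantitative estimates for the primal
and dual iterates of the ALM in case that the source condition (2.2) holds. These
results extend [12, Thm. 6.2].

Theorem 3.4. Assume that $u^\dagger$ is a $J$-minimizing solution of $Ku = g$ which
satisfies the source condition (2.2) with source element $p^\dagger \in H_2$. Then, for any $\gamma > 0$,

$$D_{J^*}^{u^\dagger}(K^*p_n^\delta, K^*p^\dagger) + \frac{t_n}{4}\,\|Ku_n^\delta - g\|^2 + \frac{\gamma - 1}{2\gamma t_n}\,\|p_n^\delta - p^\dagger\|^2 \le \frac{\|p^\dagger - p_0^\delta\|^2}{2t_n} + \frac{(1+\gamma)\,t_n\,\delta^2}{2}. \qquad (3.9)$$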

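Proof. Since $u^\dagger$ satisfies the source condition, we have that $K^*p^\dagger \in \partial J(u^\dagger)$, which
is equivalent to $u^\dagger \in \partial J^*(K^*p^\dagger)$. This leads to

$$\begin{aligned}
G(p_n^\delta, g^\delta) - G(p^\dagger, g^\delta) &= G(p_n^\delta, g) - G(p^\dagger, g) + \langle p_n^\delta - p^\dagger, g - g^\delta \rangle \\
&= J^*(K^*p_n^\delta) - J^*(K^*p^\dagger) - \langle p_n^\delta - p^\dagger, g \rangle + \langle p_n^\delta - p^\dagger, g - g^\delta \rangle \\
&= J^*(K^*p_n^\delta) - J^*(K^*p^\dagger) - \langle K^*p_n^\delta - K^*p^\dagger, u^\dagger \rangle + \langle p_n^\delta - p^\dagger, g - g^\delta \rangle \\
&= D_{J^*}^{u^\dagger}(K^*p_n^\delta, K^*p^\dagger) + \langle p_n^\delta - p^\dagger, g - g^\delta \rangle.
\end{aligned}$$

Therefore, the last equality together with Proposition 3.1 and Young's inequality
gives for an arbitrary $\gamma > 0$

$$\begin{aligned}
D_{J^*}^{u^\dagger}(K^*p_n^\delta, K^*p^\dagger) &= G(p_n^\delta, g^\delta) - G(p^\dagger, g^\delta) + \langle p_n^\delta - p^\dagger, g^\delta - g \rangle \\
&\le \frac{\|p^\dagger - p_0^\delta\|^2}{2t_n} - \frac{\gamma - 1}{\gamma}\,\frac{\|p^\dagger - p_n^\delta\|^2}{2t_n} - \frac{t_n\,\|p_n^\delta - p_{n-1}^\delta\|^2}{2\tau_n^2} + \frac{\gamma\, t_n\,\delta^2}{2}. \qquad (3.10)
\end{aligned}$$

Using (1.2b) together with the inequality $\|Ku_n^\delta - g\|^2 \le 2\,\|Ku_n^\delta - g^\delta\|^2 + 2\delta^2$ and
the previous estimate shows the assertion. □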
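Lemma 3.5. Let $a, b > 0$. Then,

$$\inf_{\gamma > 1} \left( \frac{\gamma}{\gamma - 1}\,a + \frac{\gamma^2}{\gamma - 1}\,b \right) = \big(\sqrt{b} + \sqrt{a + b}\big)^2.$$

Proof. With elementary calculus it is straightforward to deduce that the function
$f(\gamma) = (\gamma - 1)^{-1}(\gamma a + \gamma^2 b)$ attains its minimum among all $\gamma > 1$ at

$$\gamma^* = \frac{\sqrt{b} + \sqrt{a + b}}{\sqrt{b}}.$$

Then, $\gamma^*/(\gamma^* - 1) = (\sqrt{b} + \sqrt{a + b})/\sqrt{a + b}$ and hence

$$f(\gamma^*) = \frac{\gamma^*}{\gamma^* - 1}\,(a + \gamma^* b) = \frac{\gamma^*}{\gamma^* - 1}\,\big(a + b + \sqrt{b}\sqrt{a + b}\big) = \big(\sqrt{b} + \sqrt{a + b}\big)^2.$$

□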
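The closed form in Lemma 3.5 is easy to sanity-check numerically. The following sketch evaluates $f$ at the stated minimizer and on a grid of $\gamma > 1$; the values $a = 3$, $b = 1$ and the grid are arbitrary illustrative choices.

```python
import math


def f(gamma: float, a: float, b: float) -> float:
    """f(gamma) = (gamma*a + gamma^2*b) / (gamma - 1) for gamma > 1."""
    return (gamma * a + gamma * gamma * b) / (gamma - 1.0)


def gamma_star(a: float, b: float) -> float:
    """Minimizer (sqrt(b) + sqrt(a+b)) / sqrt(b) from the proof."""
    return (math.sqrt(b) + math.sqrt(a + b)) / math.sqrt(b)


def closed_form_infimum(a: float, b: float) -> float:
    """Claimed value (sqrt(b) + sqrt(a+b))^2 of the infimum."""
    return (math.sqrt(b) + math.sqrt(a + b)) ** 2


a, b = 3.0, 1.0
g_opt = gamma_star(a, b)               # equals 3 for a = 3, b = 1
value_at_opt = f(g_opt, a, b)          # equals 9 for a = 3, b = 1
grid = [1.0 + 0.01 * k for k in range(1, 2000)]
grid_min = min(f(gm, a, b) for gm in grid)
```

For $a = 3$, $b = 1$ one gets $\gamma^* = 3$ and $f(\gamma^*) = 9 = (1 + 2)^2$, and no grid point falls below this value.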

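Corollary 3.6. Let the assumptions of Theorem 3.4 hold.
i) If $0 < \alpha < 1/2$, then

$$\alpha\, D_J^{\mathrm{sym}}(u_n^\delta, u^\dagger) + D_{J^*}^{u^\dagger}(K^*p_n^\delta, K^*p^\dagger) \le \frac{1 - \alpha}{1 - 2\alpha}\,\delta^2 t_n + \frac{\|p^\dagger - p_0^\delta\|^2}{2t_n}.$$

ii) It holds

$$D_J^{\mathrm{sym}}(u_n^\delta, u^\dagger) \le \|Ku_n^\delta - g\| \left( \delta t_n + \sqrt{\delta^2 t_n^2 + \|p^\dagger - p_0^\delta\|^2} \right).$$
Proof. From Young's inequality it follows that

$$\alpha\, D_J^{\mathrm{sym}}(u_n^\delta, u^\dagger) = \alpha\, \langle p_n^\delta - p^\dagger, Ku_n^\delta - g \rangle \le \frac{\alpha}{t_n}\,\|p_n^\delta - p^\dagger\|^2 + \frac{\alpha\, t_n}{4}\,\|Ku_n^\delta - g\|^2.$$

Hence the first inequality follows from Theorem 3.4 with $\gamma = 1/(1 - 2\alpha)$, due to the
fact that $\alpha < 1/2$.

In order to prove ii) we observe from (3.10) that for all $\gamma > 1$

$$\|p_n^\delta - p^\dagger\|^2 \le \frac{\gamma}{\gamma - 1}\,\|p_0^\delta - p^\dagger\|^2 + \frac{\gamma^2}{\gamma - 1}\,\delta^2 t_n^2.$$

Hence, Lemma 3.5 with $a = \|p_0^\delta - p^\dagger\|^2$ and $b = \delta^2 t_n^2$ leads to

$$\|p_n^\delta - p^\dagger\|^2 \le \left( \delta t_n + \sqrt{\delta^2 t_n^2 + \|p_0^\delta - p^\dagger\|^2} \right)^2.$$

Finally, the assertion follows from

$$D_J^{\mathrm{sym}}(u_n^\delta, u^\dagger) = \langle p_n^\delta - p^\dagger, K(u_n^\delta - u^\dagger) \rangle \le \|Ku_n^\delta - g\|\,\|p_n^\delta - p^\dagger\|.$$

□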

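Remark 3.7.

i) Obviously, the best possible rates with respect to the estimates in Theorem 3.4
and Corollary 3.6 i) are obtained when $t_{\Gamma(\delta, g^\delta)} \sim \delta^{-1}$. However, if one only has

$$\delta\, t_{\Gamma(\delta, g^\delta)} \le C,$$

for some $C > 0$, then Corollary 3.6 ii) shows that the symmetric Bregman distance
behaves at least as well as the residual:

$$D_J^{\mathrm{sym}}(u_n^\delta, u^\dagger) = O(\|Ku_n^\delta - g\|).$$
ii) Since $K^*p_n^\delta \in \partial J(u_n^\delta)$ and $K^*p^\dagger \in \partial J(u^\dagger)$ are equivalent to $u_n^\delta \in \partial J^*(K^*p_n^\delta)$ and
$u^\dagger \in \partial J^*(K^*p^\dagger)$, respectively, it follows that

$$\begin{aligned}
D_{J^*}^{\mathrm{sym}}(K^*p_n^\delta, K^*p^\dagger) &= D_{J^*}^{u^\dagger}(K^*p_n^\delta, K^*p^\dagger) + D_{J^*}^{u_n^\delta}(K^*p^\dagger, K^*p_n^\delta) \\
&= \langle u_n^\delta - u^\dagger, K^*p_n^\delta - K^*p^\dagger \rangle.
\end{aligned}$$

The right-hand side equals $D_J^{\mathrm{sym}}(u_n^\delta, u^\dagger)$. Hence, all estimates for the primal variables $\{u_n^\delta\}_{n\in\mathbb{N}}$ automatically hold also for
the dual variables $\{K^*p_n^\delta\}_{n\in\mathbb{N}}$.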

4. Morozov's discrepancy principle. In this section we analyze the discrepancy
principle as an a posteriori stopping rule. In order to apply the convergence
(rate) results in Theorems 3.2, 3.3 and 3.4, a given stopping rule $\Gamma : (0, \infty) \times H_2 \to \mathbb{N}$
has to satisfy (3.4) and (3.7), respectively. We verify these estimates for the particular
situation where the stopping index is chosen according to Morozov's discrepancy
principle: Choose $\rho > 1$ and define

$$\Gamma(\delta, g^\delta) := \min\{n \in \mathbb{N} : \|Ku_n^\delta - g^\delta\| \le \rho\delta\}. \qquad (4.1)$$

That is, we take the first iterate $u_n^\delta$ for which the residual $\|Ku_n^\delta - g^\delta\|$ falls below a
number which is a constant $\rho$ times the noise level $\delta$.
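In code, the rule (4.1) amounts to scanning the residual norms of the iterates and returning the first index at which the threshold $\rho\delta$ is reached. The sketch below is a hypothetical helper that takes the sequence of residual norms $\|Ku_n^\delta - g^\delta\|$, $n = 1, 2, \ldots$, as input; how these residuals are produced is left open.

```python
from typing import Iterable


def discrepancy_stopping_index(residual_norms: Iterable[float],
                               delta: float, rho: float) -> int:
    """Return the smallest n (1-based) with residual_norms[n-1] <= rho * delta."""
    if rho <= 1.0:
        raise ValueError("the discrepancy principle (4.1) requires rho > 1")
    threshold = rho * delta
    for n, residual in enumerate(residual_norms, start=1):
        if residual <= threshold:
            return n
    # Only reachable for a finite list of residuals that stays above the threshold.
    raise ValueError("residual never dropped below rho * delta")


# Hypothetical residual history of some iteration.
residuals = [1.0, 0.5, 0.2, 0.05]
stop_strict = discrepancy_stopping_index(residuals, delta=0.1, rho=1.5)
stop_loose = discrepancy_stopping_index(residuals, delta=0.1, rho=2.5)
```

A larger $\rho$ stops the iteration earlier: with the sample residuals, the threshold $0.15$ is first met at the fourth iterate, while the threshold $0.25$ is already met at the third.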

Proposition 4.1. The stopping rule (4.1) is well defined.

Proof. It follows from [12, Cor. 5.2] that there exists a constant $C > 0$ such that

$$\frac{1}{2}\,\|Ku_n^\delta - g^\delta\|^2 \le \frac{C}{t_n} + \frac{\delta^2}{2}.$$

Since $t_n \to \infty$ by (2.1), this implies that for all $\rho > 1$ there exists an index $n_0 \in \mathbb{N}$ for which $\|Ku_{n_0}^\delta - g^\delta\| \le
\rho\delta$. Thus, $\Gamma(\delta, g^\delta) < \infty$ is ensured. □

Our analysis is structured as follows: In Section 4.1 we derive convergence rates
(based on Corollary 3.6 ii)) for the symmetric Bregman distance between the primal
iterates $\{u_n^\delta\}_{n\in\mathbb{N}}$ and $J$-minimizing solutions of $Ku = g$, under the hypothesis that the
source condition holds. Here, we make no other assumption on $\Gamma(\delta, g^\delta)$ except (4.1).

In Section 4.2 we then point out that the convergence results in Theorems 3.2 and 3.3
apply for the parameter choice rule (4.1) if additionally one requires $\lim_{\delta\to 0} \Gamma(\delta, g^\delta) =
\infty$. We refer to this situation as the non-degenerate case. Finally, in Section 4.3 we
treat the degenerate case, i.e., where $\{\Gamma(\delta, g^\delta)\}_\delta$ has finite accumulation points.

4.1. Convergence rates. We will state a qualitative estimate for the Bregman
distance between the primal variables in the ALM and solutions of (1.1) if the source
condition is satisfied and if the Morozov stopping rule is applied. In particular, this
analysis sheds some light on the role of $\rho$ in (4.1).

Lemma 4.2. Let $u^\dagger$ be a $J$-minimizing solution of $Ku = g$ that satisfies the
source condition with source element $p^\dagger$ and assume that $\Gamma$ is chosen according to the
stopping rule (4.1). Then,

$$\delta\, t_{\Gamma(\delta, g^\delta)} \le \frac{\|p^\dagger - p_0^\delta\|}{\sqrt{\rho - 1}} + \delta\bar\tau. \qquad (4.2)$$

In particular, (3.7) is satisfied.
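Proof. Let $g^\delta \in H_2$ and set $\delta := \|g - g^\delta\|$ as well as $n_* = \Gamma(\delta, g^\delta) - 1$. Then, it
follows from (4.1) that

$$\|Ku_{n_*}^\delta - g^\delta\| > \rho\delta.$$

This together with (3.3) yields

$$\frac{\|p - p_{n_*}^\delta\|^2}{2t_{n_*}} + \frac{\rho\,\delta^2 t_{n_*}}{2} \le G(p, g^\delta) - G(p_{n_*}^\delta, g^\delta) + \frac{\|p - p_0^\delta\|^2}{2t_{n_*}}$$

for all $p \in H_2$ (recall that $\|Ku_n^\delta - g^\delta\| = \tau_n^{-1}\,\|p_n^\delta - p_{n-1}^\delta\|$ by (1.2b), and note that $\rho^2 \ge \rho$). From the definition
of $G$ it follows that $G(p, g^\delta) - G(p_{n_*}^\delta, g^\delta) = G(p, g) - G(p_{n_*}^\delta, g) + \langle p - p_{n_*}^\delta, g - g^\delta \rangle$.
After applying Young's inequality to the inner product we get, for every $p \in H_2$ and
$\eta > 0$,

$$\frac{\|p - p_{n_*}^\delta\|^2}{2t_{n_*}} + \frac{\rho\,\delta^2 t_{n_*}}{2} \le G(p, g) - G(p_{n_*}^\delta, g) + \frac{\|p - p_{n_*}^\delta\|^2}{2\eta} + \frac{\eta\,\delta^2}{2} + \frac{\|p - p_0^\delta\|^2}{2t_{n_*}}.$$

Setting $\eta = t_{n_*}$ hence gives

$$\frac{(\rho - 1)\,\delta^2 t_{n_*}}{2} \le G(p, g) - G(p_{n_*}^\delta, g) + \frac{\|p - p_0^\delta\|^2}{2t_{n_*}}. \qquad (4.3)$$

Since $u^\dagger$ satisfies the source condition with source element $p^\dagger$, it follows from [12,
Prop. 6.1] that $G(p^\dagger, g) \le G(p, g)$ for all $p \in H_2$. Moreover, using $p^\dagger$ instead of $p$ in
(4.3) shows

$$\frac{(\rho - 1)\,\delta^2 t_{n_*}}{2} \le \frac{\|p^\dagger - p_0^\delta\|^2}{2t_{n_*}},$$

or in other words, since $t_{\Gamma(\delta, g^\delta)} \le t_{n_*} + \bar\tau$,

$$\delta\, t_{\Gamma(\delta, g^\delta)} \le \delta\,(t_{n_*} + \bar\tau) \le \frac{\|p^\dagger - p_0^\delta\|}{\sqrt{\rho - 1}} + \delta\bar\tau.$$

□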

With this preparation we are ready to state the announced estimate for the primal 
variables. 

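Theorem 4.3. Let the assumptions of Lemma 4.2 be satisfied. Then,

$$D_J^{\mathrm{sym}}\big(\mathcal{R}_{\Gamma(\delta, g^\delta)}(g^\delta), u^\dagger\big) \le \big(1 + O(\sqrt{\delta})\big)\,\rho\,\frac{1 + \sqrt{\rho}}{\sqrt{\rho - 1}}\,\|p_0^\delta - p^\dagger\|\,\delta. \qquad (4.4)$$

Proof. From (4.2) it follows that

$$\begin{aligned}
\delta^2 t_{\Gamma(\delta, g^\delta)}^2 + \|p^\dagger - p_0^\delta\|^2 &\le \frac{1}{\rho - 1}\,\|p^\dagger - p_0^\delta\|^2 + \frac{2\delta\bar\tau}{\sqrt{\rho - 1}}\,\|p^\dagger - p_0^\delta\| + \delta^2\bar\tau^2 + \|p^\dagger - p_0^\delta\|^2 \\
&= \frac{\rho}{\rho - 1}\,\|p^\dagger - p_0^\delta\|^2 + \frac{2\delta\bar\tau}{\sqrt{\rho - 1}}\,\|p^\dagger - p_0^\delta\| + \delta^2\bar\tau^2.
\end{aligned}$$

This together with (4.2) and the fact that $\sqrt{a + b} \le \sqrt{a} + \sqrt{b}$ for all $a, b \ge 0$ implies

$$\delta\, t_{\Gamma(\delta, g^\delta)} + \sqrt{\delta^2 t_{\Gamma(\delta, g^\delta)}^2 + \|p^\dagger - p_0^\delta\|^2} \le \frac{1 + \sqrt{\rho}}{\sqrt{\rho - 1}}\,\|p^\dagger - p_0^\delta\| + O(\sqrt{\delta}).$$

Since by construction in (4.1)

$$\|K\mathcal{R}_{\Gamma(\delta, g^\delta)}(g^\delta) - g^\delta\| \le \rho\delta,$$

the assertion follows from Corollary 3.6 ii). □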
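Remark 4.4. The function

$$f(\rho) = \rho\,\frac{1 + \sqrt{\rho}}{\sqrt{\rho - 1}},$$

which appears on the right-hand side of (4.4), is minimal for $\rho^* \approx 1.6404$ with $f(\rho^*) \approx
4.6753$. Hence, after setting $\rho = \rho^*$ in the stopping rule (4.1), Theorem 4.3 implies
the following rough estimate

$$D_J^{\mathrm{sym}}\big(\mathcal{R}_{\Gamma(\delta, g^\delta)}(g^\delta), u^\dagger\big) \le 5\,\|p_0^\delta - p^\dagger\|\,\delta$$

as $\delta \to 0$.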
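The two constants quoted in Remark 4.4 can be reproduced numerically. The sketch below assumes that the function in question is $f(\rho) = \rho\,(1 + \sqrt{\rho})/\sqrt{\rho - 1}$; this form is a reconstruction from the surrounding estimates (the display is not legible in the source), and it is consistent with the quoted values. The search interval and the ternary-search helper are illustrative choices.

```python
import math
from typing import Callable


def f(rho: float) -> float:
    """Assumed constant in (4.4): rho * (1 + sqrt(rho)) / sqrt(rho - 1), rho > 1."""
    return rho * (1.0 + math.sqrt(rho)) / math.sqrt(rho - 1.0)


def minimize_unimodal(func: Callable[[float], float],
                      lo: float, hi: float, iterations: int = 200) -> float:
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if func(m1) < func(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


rho_star = minimize_unimodal(f, 1.001, 10.0)
f_star = f(rho_star)

# Setting the derivative of log f to zero gives 2x^2 - x - 2 = 0 for x = sqrt(rho).
rho_closed_form = ((1.0 + math.sqrt(17.0)) / 4.0) ** 2
```

Under this assumption the stationarity condition has the single admissible root $\sqrt{\rho^*} = (1 + \sqrt{17})/4$, which matches the numerical minimizer.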

4.2. The nondegenerate case. In this section we will show that the assumptions
of Theorems 3.2 and 3.3 are satisfied for the stopping rule (4.1), if additionally
one requires

$$\lim_{k\to\infty} \Gamma(\delta_k, g_k) = \infty. \qquad (4.5)$$

From Lemma 4.2 it already follows that (3.7) holds, which implies applicability of
Theorem 3.3. Moreover, we find

Lemma 4.5. Assume that $\Gamma$ is chosen according to the stopping rule (4.1) and that
(4.5) holds. Then, $\Gamma(\delta_k, g_k)$ satisfies (3.4), i.e. $\delta_k^2\, t_{\Gamma(\delta_k, g_k)} \to 0$ and $t_{\Gamma(\delta_k, g_k)} \to +\infty$,
as $k \to \infty$.

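Proof. Let $\varepsilon > 0$ and choose $p_\varepsilon \in H_2$ such that $G(p_\varepsilon, g) \le \inf_{q \in H_2} G(q, g) + \varepsilon$ (note
that, due to [12, Lem. 4.1], the right hand side is finite whenever $g$ is attainable).
This together with the estimate (4.3) in the proof of Lemma 4.2 shows

$$(\rho - 1)\,\delta^2 t_{n_*} \le 2\varepsilon + \frac{\|p_\varepsilon - p_0^\delta\|^2}{t_{n_*}}.$$

According to (2.1), the conditions $\tau_k \le \bar\tau$ for all $k \in \mathbb{N}$ and $\lim_{k\to\infty} \Gamma(\delta_k, g_k) = \infty$
imply $\lim_{k\to\infty} t_{\Gamma(\delta_k, g_k)} = +\infty$. Hence, substituting $g_k$ for $g^\delta$, $\delta_k$ for $\delta$, and $\Gamma(\delta_k, g_k) - 1$
for $n_*$ shows

$$\limsup_{k\to\infty} \delta_k^2\, t_{\Gamma(\delta_k, g_k)} \le \limsup_{k\to\infty} \left( \frac{1}{\rho - 1} \left( 2\varepsilon + \frac{\|p_\varepsilon - p_0^\delta\|^2}{t_{\Gamma(\delta_k, g_k) - 1}} \right) + \delta_k^2\,\bar\tau \right) = \frac{2\varepsilon}{\rho - 1}.$$

Since $\varepsilon$ is arbitrary, this proves the statement. □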

Combining the above results with Theorem 3.2 yields results on convergence for
Morozov's discrepancy principle as a stopping rule:

Corollary 4.6. Assume that $\Gamma$ is chosen as in (4.1) and that (4.5) holds.
Then, the sequence $\{\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k)\}_{k\in\mathbb{N}}$ is bounded and each weak cluster point $u^\dagger$ is
a $J$-minimizing solution of $Ku = g$. Additionally, (3.5) and (3.6) hold.
If additionally the source condition is satisfied, Lemma 4.5 and Theorem 3.3 imply
Corollary 4.7. Let the assumptions of Corollary 4.6 be satisfied and assume
that there exists a solution $u^\dagger$ of (1.1) which verifies the source condition with source
element $p^\dagger$. Then, (3.8) holds and each weak cluster point of $\{\mathcal{R}^*_{\Gamma(\delta_k, g_k)}(g_k)\}_{k\in\mathbb{N}}$ is a
minimizer of $G(\cdot, g)$.

Remark 4.8. From Schauder's Theorem and from $\overline{\operatorname{ran}(K)} = \ker(K^*)^\perp$ it follows
that for each compact $K$ with dense range, the adjoint operator $K^*$ is compact and
injective and hence

$$\lim_{k\to\infty} K^*\mathcal{R}^*_{\Gamma(\delta_k, g_k)}(g_k) = K^*p$$

strongly, where $p$ is a minimizer of $G(\cdot, g)$. If the condition on the range of $K$ is not
satisfied, then strong convergence holds on subsequences.

4.3. The degenerate case. We will finally discuss the case when the stopping
index chosen by Morozov's discrepancy principle degenerates, that is, when there
exists an $N \in \mathbb{N}$ such that

$$\limsup_{\delta \to 0} \Gamma(\delta, g^\delta) = N. \qquad (4.6)$$

In this case, the assumption (4.5) is not satisfied and the results of Section 4.2 do not
apply in general.

The following result shows, however, that a degenerate stopping rule as in (4.6)
already implies that the true solutions of (1.1) satisfy the source condition (2.2) and
hence the results in Section 4.1 hold. Moreover, the convergence (on subsequences)
of the dual sequence also follows.

Theorem 4.9. Let $\Gamma : (0, \infty) \times H_2 \to \mathbb{N}$ be as in (4.1) and assume that (4.6)
holds. Then, the following assertions are true:
i) The set $\{p_N^\delta\}_{\delta > 0}$ is bounded and each of its weak cluster points is a minimizer
of $G(\cdot, g)$.

ii) The set $\{u_N^\delta\}_{\delta > 0}$ is bounded and each of its weak cluster points is a $J$-minimizing
solution of $Ku = g$.

iii) All $J$-minimizing solutions of $Ku = g$ satisfy the source condition with a source
element $p^\dagger$.

iv)

$$\|Ku_N^\delta - g\| \le (\rho + 1)\,\delta \quad \text{and} \quad D_J^{\mathrm{sym}}(u_N^\delta, u^\dagger) = O(\delta). \qquad (4.7)$$

Proof. The definition of $\Gamma(\delta, g^\delta)$ in (4.1) and the monotonicity of the residual
$\|Ku_n^\delta - g^\delta\|$ (cf. [12, Cor. 3.3]) imply

$$\|Ku_N^\delta - g^\delta\| \le \rho\delta \quad \text{for all } \delta > 0. \qquad (4.8)$$

In particular, this yields $\|Ku_N^\delta - g\| \le (\rho + 1)\,\delta$.

It was shown in the proof of [12, Thm. 5.3] (by using Güler's estimate (3.3) and
Young's inequality) that

$$\|p - p_N^\delta\|^2 \le 2\,\|p - p_0^\delta\|^2 + 4t_N^2\delta^2 + 4t_N \Big( G(p, g) - \inf_{q \in H_2} G(q, g) \Big) \quad \text{for all } p \in H_2.$$

Choosing an arbitrary $p$ such that $G(p, g) < \infty$ implies

$$\limsup_{\delta \to 0^+} \|p_N^\delta\| =: A < \infty.$$
<5->0+ 
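Now, let $\{\delta_k\}_{k\in\mathbb{N}}$ be such that $\delta_k \to 0^+$ and that $p_N^{\delta_k} \rightharpoonup \hat p \in H_2$. Due to the dual
characterization (3.2) and to the equality $p_{N-1}^\delta - p_N^\delta = \tau_N (Ku_N^\delta - g^\delta)$, it follows that
$Ku_N^{\delta_k} - g^{\delta_k} \in \partial G(\cdot, g^{\delta_k})(p_N^{\delta_k})$. Since $G(p, g) = G(p, g^{\delta_k}) + \langle p, g^{\delta_k} - g \rangle$ for all $p \in H_2$,
one has

$$Ku_N^{\delta_k} - g \in \partial G(\cdot, g)(p_N^{\delta_k}).$$

Recall that the graph of the subgradient of a convex and lower semi-continuous functional
is weakly-strongly closed. Therefore, inequality (4.8) yields

$$0 = \lim_{k\to\infty} \big( Ku_N^{\delta_k} - g \big) \in \partial G(\cdot, g)\big( \operatorname*{w-lim}_{k\to\infty} p_N^{\delta_k} \big) = \partial G(\cdot, g)(\hat p).$$

This proves i).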

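From the definition of $u_N^\delta$ in (1.2a) and the fact that $p_{N-1}^\delta - p_N^\delta = \tau_N (Ku_N^\delta - g^\delta)$
it follows (for $\delta$ small enough)

$$\begin{aligned}
\frac{\tau_N}{2}\,\|Ku_N^\delta - g^\delta\|^2 + J(u_N^\delta) &\le \frac{\tau_N}{2}\,\delta^2 + J(u^\dagger) + \langle p_{N-1}^\delta, Ku_N^\delta - g \rangle \\
&\le \frac{\tau_N}{2}\,\delta^2 + J(u^\dagger) + A(\rho + 1)\,\delta + \tau_N\,\rho(\rho + 1)\,\delta^2.
\end{aligned}$$

In other words, $J(u_N^\delta) - J(u^\dagger) = O(\delta)$ as $\delta \to 0^+$. This together with (4.8) shows
that $\sup_{\delta > 0} \{J(u_N^\delta) + \|Ku_N^\delta\|^2\} < \infty$ and consequently, according to Assumption 1,
that $\{u_N^\delta\}_{\delta > 0}$ is weakly compact and hence bounded. Thus, ii) follows from (4.8) and
the lower semi-continuity of $J$.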

Let $p^\dagger$ be a minimizer of $G(\cdot, g)$, which exists according to i). This and the
definition of $G(p, g)$ in (3.1) imply

$$G(p^\dagger, g) - G(p_N^\delta, g^\delta) \le \delta\,\|p_N^\delta\|.$$

Moreover, we deduce from the optimality condition of (1.2a) that $K^*p_N^\delta \in \partial J(u_N^\delta)$,
which in turn implies that $Ku_N^\delta \in \partial (J^* \circ K^*)(p_N^\delta)$. Using the definition of the
subgradient together with (4.8) and some rearrangements gives

$$G(p^\dagger, g) - G(p_N^\delta, g^\delta) \ge -\delta \big( (\rho + 1)\,\|p^\dagger\| + \rho\,\|p_N^\delta\| \big).$$

Since $\{\|p_N^\delta\|\}_{\delta > 0}$ is bounded according to i), the previous two estimates result in

$$\lim_{\delta \to 0^+} \big( J^*(K^*p_N^\delta) - \langle p_N^\delta, g^\delta \rangle \big) = \lim_{\delta \to 0^+} G(p_N^\delta, g^\delta) = G(p^\dagger, g) = J^*(K^*p^\dagger) - \langle p^\dagger, g \rangle. \qquad (4.9)$$

Using once more the relation $K^*p_N^\delta \in \partial J(u_N^\delta)$ shows that $J^*(K^*p_N^\delta) + J(u_N^\delta) =
\langle K^*p_N^\delta, u_N^\delta \rangle$ and consequently

$$J^*(K^*p_N^\delta) - \langle p_N^\delta, g^\delta \rangle + J(u_N^\delta) = \langle Ku_N^\delta - g^\delta, p_N^\delta \rangle.$$

Now, let $u^\dagger$ be a $J$-minimizing solution of $Ku = g$, which exists according to ii).
Taking the limit $\delta \to 0^+$ in the previous equality, using (4.8), (4.9), as well as the
boundedness of $\{p_N^\delta\}_{\delta > 0}$ and the fact that $J(u_N^\delta) \to J(u^\dagger)$, results in

$$J(u^\dagger) + J^*(K^*p^\dagger) = \langle p^\dagger, g \rangle = \langle K^*p^\dagger, u^\dagger \rangle,$$
that is, $K^*p^\dagger \in \partial J(u^\dagger)$. This proves iii).
Statement iv) follows from i), iii) and Corollary 3.6 ii) together with the first
inequality in (4.7). □

Remark 4.10. As $\{\Gamma(\delta, g^\delta)\}_{\delta > 0}$ has finite accumulation points, without restricting
generality, we can consider that this is a constant subsequence. This yields that
for all $\delta$ sufficiently small, one has to stop the algorithm at the same iteration.

A degenerate case is discussed for the Landweber method for nonlinear equations
in the book [11, p. 284]. It is shown there that $\lim_{\delta \to 0} u_N^\delta = u_N$, where $u_N$ is the
$N$-th iterate in the exact data case and is a solution of the operator equation as well.
This means that in the exact data case the Landweber algorithm reaches the solution
after $N$ steps, with $N$ being the stopping index in the noisy data case.

For the ALM analyzed here, we could not show that $\lim_{\delta \to 0} u_N^\delta = u_N$, where $u_N$
is the $N$-th iterate in the exact data case, because the implicit feature of the method
makes the analysis more difficult. However, we could establish that the accumulation
points of $\{u_N^\delta\}_{\delta > 0}$ are $J$-minimizing solutions with additional smoothness, i.e.,
satisfying the source condition.

The results for the two cases are briefly summarized in the following corollary.

Corollary 4.11. Let $\Gamma : (0, \infty) \times H_2 \to \mathbb{N}$ be chosen according to Morozov's
rule (4.1). Then, as $\delta \to 0$, the stopping index $\Gamma$ either increases and leads to weak
convergence of the ALM algorithm on subsequences to solutions of the operator equation,
or is constant, in which case the corresponding ALM iterates converge weakly on
subsequences to a solution of the equation satisfying the source condition.

5. Iterative total variation regularization. The ALM in the case of
$J$ being the total variation seminorm (1.3) is also known as Bregman iteration [20].
It was shown in [20] that Morozov's discrepancy principle yields weak* convergence
in $BV(\Omega)$ of the iterative method. The expected but missing convergence there was
the one with respect to the total variation seminorm, in the sense

$$\lim_{k\to\infty} J(u_k) = J(u). \qquad (5.1)$$

As a consequence of the analysis based on the augmented Lagrangian method tools,
it became clear that this convergence does hold. Moreover, linear convergence rates
with respect to the Bregman distance associated with the total variation seminorm
were established in [j| first for the noise free case. According to Q and due to the
symmetric Bregman distance estimates pointed out in this work, such convergence
rates provide information on the fine structure of the iterates, that is, the variation
of the iterates is concentrated around the discontinuity set of the true solution. In
the noisy data case, an a posteriori stopping rule was proposed in [20]:

$$n_*(\delta, g^\delta) = \max\{n \in \mathbb{N} : \|Ku_n^\delta - g^\delta\| > \rho\delta\}, \quad \rho > 1.$$

Although convergence was shown there for the net $\{u^\delta_{n_*(\delta, g^\delta)}\}$ as $\delta \to 0$, no convergence
rate was obtained for it. This section aims to point out such a convergence rate. Note
that the a posteriori rule (4.1) employed here relates to the above mentioned one by

$$\Gamma(\delta, g^\delta) = n_*(\delta, g^\delta) + 1.$$

Still, the question of how to quantify the weak* convergence is not answered.
A possible answer could be given by taking into account that weak* convergence in
$BV(\Omega)$ together with convergence in the sense (5.1) is equivalent to so-called strict
convergence. Thus, one can obtain convergence rates with respect to a related metric,
as shown below. Recall [2, page 125] that $\{u_k\}_{k\in\mathbb{N}} \subset BV(\Omega)$ converges strictly to $u$ if
it converges with respect to the metric

$$d(u, v) = \|u - v\|_{L^1} + |J(u) - J(v)|. \qquad (5.2)$$
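A small discrete experiment illustrates why the second term in (5.2) matters. The sketch below is a one-dimensional analogue on a uniform grid over $[0, 1]$ (the paper works on $\Omega \subset \mathbb{R}^2$); the grid size, the helper names and the test functions $u_k(x) = \sin(2\pi k^2 x)/k$ are illustrative assumptions. These functions tend to zero in $L^1$ while their total variation grows, so they do not converge strictly.

```python
import math
from typing import List, Sequence


def l1_norm(u: Sequence[float], h: float) -> float:
    """Riemann-sum approximation of the L^1 norm on a grid with spacing h."""
    return h * sum(abs(x) for x in u)


def total_variation(u: Sequence[float]) -> float:
    """Discrete 1-D total variation: sum of absolute jumps between neighbours."""
    return sum(abs(b - a) for a, b in zip(u, u[1:]))


def strict_distance(u: Sequence[float], v: Sequence[float], h: float) -> float:
    """Discrete analogue of (5.2): ||u - v||_{L^1} + |J(u) - J(v)|."""
    diff = [a - b for a, b in zip(u, v)]
    return l1_norm(diff, h) + abs(total_variation(u) - total_variation(v))


n = 4000
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]
zero = [0.0] * n


def oscillation(k: int) -> List[float]:
    """Small-amplitude, high-frequency function sin(2*pi*k^2*x) / k."""
    return [math.sin(2.0 * math.pi * k * k * x) / k for x in grid]


u5, u10 = oscillation(5), oscillation(10)
l1_5, l1_10 = l1_norm(u5, h), l1_norm(u10, h)
tv_5, tv_10 = total_variation(u5), total_variation(u10)
d_5 = strict_distance(u5, zero, h)
```

Here the $L^1$ norms shrink as $k$ grows while the total variations increase, so the distance to the zero function in the sense of (5.2) stays large.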

In this section we consider the linear and bounded operator $K : L^2(\Omega) \to L^2(\Omega)$,
where $\Omega \subset \mathbb{R}^2$ is open and bounded.

Proposition 5.1. Let $\{g_k\}_{k\in\mathbb{N}} \subset L^2(\Omega)$ be such that $\|g - g_k\| \le \delta_k \to 0$ as
$k \to \infty$. Let $\Gamma$ be chosen according to Morozov's rule (4.1) and assume that
$\lim_{k\to\infty} \Gamma(\delta_k, g_k) = \infty$. Then, the sequence $\{\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k)\}_{k\in\mathbb{N}}$ satisfies (3.5) and
(3.6). Moreover, it has a subsequence which converges strictly to a $J$-minimizing
solution of $Ku = g$.

Proof. The first assertions result from Corollary 4.6. Let further $u_k =
\mathcal{R}_{\Gamma(\delta_k, g_k)}(g_k)$. According to Corollary 4.6, the sequence $\{u_k\}_{k\in\mathbb{N}}$ is bounded in $L^2(\Omega)$
and $\sup_{k\in\mathbb{N}} J(u_k) < \infty$. Hence we find that

$$\sup_{k\in\mathbb{N}} \|u_k\|_{BV} = \sup_{k\in\mathbb{N}} \big( \|u_k\|_{L^1} + J(u_k) \big) < \infty.$$

Theorem 2.5 in [1] implies that $\{u_k\}_{k\in\mathbb{N}}$ is strongly $L^1$-compact and thus there is a
subsequence, indexed by $k'$, which converges to some $u^*$ strongly in $L^1(\Omega)$. Since each
$L^2$-weak cluster point of $\{u_k\}_{k\in\mathbb{N}}$ is a $J$-minimizing solution of $Ku = g$ according to
Corollary 4.6, the same holds for $u^*$. Finally, it follows from (3.5) that $d(u_{k'}, u^*) \to 0$.
□

Clearly, error estimates in terms of the $L^1$-norm are desirable, but not easy to
derive. In order to show convergence rates for strict convergence of the iterates, we
need to employ another metric, which appears naturally in the analysis, namely

$\tilde d(u, v) = \|Ku - Kv\|_{L^2} + |J(u) - J(v)|.$ (5.3)

The following lemma points out the relation between the two metrics described 
above. 

Lemma 5.2. Assume that $K : L^1(\Omega) \to L^2(\Omega)$ is continuous and can be extended
by continuity to $L^2(\Omega)$. Then, convergence of a sequence with respect to the metric
$d$ defined by (5.2) implies convergence of the sequence with respect to the metric $\tilde d$
defined by (5.3). If additionally the linear bounded operator $K : L^2(\Omega) \to L^2(\Omega)$ is
injective, then the two metrics are equivalent.

Proof. The first part follows immediately from $\|Ku\|_{L^2} \le \|K\| \, \|u\|_{L^1}$ for any
$u \in L^1(\Omega)$.

Assume now that $\tilde d(u_k, u) \to 0$ as $k \to \infty$ and that $K$ is injective. Then, $K$ in
particular does not annihilate constant functions and it follows from [1, Lemma 4.1]
that $u \mapsto \|Ku\|_{L^2} + J(u)$ is BV-coercive. Hence boundedness of $\{\|Ku_k\|_{L^2}\}_{k\in\mathbb{N}}$ and
$\{J(u_k)\}_{k\in\mathbb{N}}$, which follows from $\tilde d(u_k, u) \to 0$, yields boundedness of $\{\|u_k\|_{BV}\}_{k\in\mathbb{N}}$.
Thus, there exists a subsequence $\{u_{k'}\}_{k'\in\mathbb{N}}$ which converges to some $v \in BV(\Omega)$
strongly in $L^1(\Omega)$ and weakly in $L^2(\Omega)$, due to the compact and the bounded embedding,
respectively (cf. [1, Theorem 2.5]). These yield strong convergence of the subsequence
in $L^1(\Omega)$ to $v$, as well as weak convergence in $L^2(\Omega)$ of $\{Ku_{k'}\}_{k'\in\mathbb{N}}$ to $Kv$. Since the
weak limit is unique, it follows that $Ku = Kv$ and consequently, since $K$ is injective,
that $u = v$.

Since the same argument applies to every subsequence of $\{u_k\}_{k\in\mathbb{N}}$, the entire sequence
converges strongly in $L^1(\Omega)$ to $u$, which completes the proof. □

Note that the continuity of the operator $K$ from $L^1(\Omega)$ into
$L^2(\Omega)$ is not necessary for proving the second part of the lemma.
Now we are able to show the convergence rate in terms of the metric $\tilde d$:

Proposition 5.3. Let $\{g_k\}_{k\in\mathbb{N}} \subset L^2(\Omega)$ be such that $\|g - g_k\| \le \delta_k \to 0$ as $k \to
\infty$. Let $\Gamma$ be chosen according to rule (4.1) and assume that $\lim_{k\to\infty} \Gamma(\delta_k, g_k) = \infty$.
If $u^\dagger$ is a J-minimizing solution of $Ku = g$ that satisfies the source condition (2.2)
with source element $p^\dagger \in H_2$, then the following convergence rate holds:

$\tilde d(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k), u^\dagger) = \|K\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k) - Ku^\dagger\| + |J(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)) - J(u^\dagger)| = O(\delta_k).$

Proof. From the definition of rule (4.1) it follows that $\|K\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k) - Ku^\dagger\| =
O(\delta_k)$.

In order to establish an error estimate for $|J(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)) - J(u^\dagger)|$, we use
Theorem 2.3. Indeed, since the symmetric Bregman distance is larger than the Bregman
distance, one has

$J(u^\dagger) - J(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)) \le \langle \mathcal{R}^*_{\Gamma(\delta_k,g_k)}(g_k), \, g - K\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k) \rangle + D_J^{\mathrm{sym}}(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k), u^\dagger).$

Using the Cauchy-Schwarz inequality and again Corollary 3.6 we see that

$J(u^\dagger) - J(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)) = O(\delta_k).$

Similarly one can show

$J(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)) - J(u^\dagger) = O(\delta_k),$

which ends the proof. □

6. Sparse regularization. In the case of sparse regularization, the convex functional
(1.4) is considered with $1 \le q \le 2$ (see [9]). The aim of the functional $J$ is
to promote sparse solutions, i.e. solutions which have only a few (in particular, finitely
many) nonzero entries. Tikhonov regularization based on this regularization
functional has been studied in great detail in 13, IH [ijj]. The case $q = 1$ for the
stationary augmented Lagrangian method has been treated in [5], also under the name
Bregman iteration. There, the authors obtained convergence of the method for noise-free
data with respect to the Bregman distance and considered an a priori stopping rule for
noisy data. In this section we also treat the case $q = 1$ and derive both an enhanced
convergence rate in norm for noise-free data and optimal convergence rates for noisy
data with the a posteriori rule given by Morozov's discrepancy principle.

6.1. Convergence rates for $\delta \to 0$. We start with a result on convergence in
the noisy data case which holds for all $q \in [1,2]$. Fulfillment of a source condition is
not needed here.

Theorem 6.1. Let $K : \ell^2 \to H_2$ be linear and bounded, $1 \le q \le 2$ and let $J$ be
defined by (1.4). Moreover, let the parameter choice $\Gamma$ obey (3.4). Then the sequence
$\{\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)\}$ has a subsequence which converges strongly to a J-minimizing solution
of $Ku = g$.

Proof. By Theorem 3.2 the sequence $\{\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)\}$ is bounded in $\ell^2$ and
hence has a subsequence which converges weakly in $\ell^2$. Moreover, it follows from
Theorem 3.2 that $J(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)) \to J(u^\dagger)$. By 0, Lemma 4.3] this shows that
the subsequence of $\{\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)\}$ also converges strongly. □

Note that the entire sequence of iterates converges strongly to the unique
J-minimizing solution of $Ku = g$ in the case $q \in (1,2]$.
By Theorem 4.5 we also conclude that $\ell^q$-regularization combined with Morozov's
discrepancy principle gives rise to a (sub-sequentially) convergent regularization
method and, if additionally the source condition is fulfilled, leads to convergence rates
in the sense of Bregman distances.

Actually, in the latter case, we can strengthen the above result. More precisely,
we can derive convergence rates with respect to the $\ell^q$ norm for $q \in [1,2]$. The two
cases $q \in (1,2]$ and $q = 1$ have to be treated separately.

In the case $q \in (1,2]$, we take advantage of the differentiability and the high
degree of convexity of the functional $J$ to estimate even the distance between the
subgradients appearing in the iterative process.

The Fenchel conjugate of $J$ is $J^*(\xi) = \frac{1}{r}\|\xi\|_{\ell^r}^r$ with $r = q/(q-1) \ge 2$ (see, e.g.,
[H Proposition 4.2, p. 19]).

The following result, which will be useful in the sequel, was pointed out in [22,
Proposition 3.2]. We give here the proof for the sake of completeness.

Lemma 6.2. If $q \in (1,2]$ and $J$ is defined according to (1.4), then one has for all $v \in \ell^q$
and $u \in \ell^{2(q-1)}$, $u \ne 0$,

$D_J(v,u) \ge c_q \|v - u\|_{\ell^q}^2$ (6.1)

for $\|v - u\|_{\ell^q}$ small enough, where $c_q = c_q(u)$ is a positive number.

Proof. The inequality is obvious if $q = 2$. Let $q \in (1,2)$. Note that $D(\partial J) =
\ell^{2(q-1)}$. In order to simplify the notation in the proof, we omit the subscript for the
$\ell^q$ norm. Now @, Lemma 1.4.8] implies that for all $v \in \ell^q$, $u \in \ell^{2(q-1)}$,

$D_J(v,u) \ge (t + \|u\|)^q - \|u\|^q - q t \|u\|^{q-1},$ (6.2)

where $t := \|v - u\|$. Let $\varphi(t) := (t + \|u\|)^q$ for $t$ small enough. The Taylor expansion
of $\varphi$ around $0$ yields existence of an $a_t \in (0, t)$ such that

$\varphi(t) = \|u\|^q + q t \|u\|^{q-1} + \frac{q(q-1)t^2}{2}\|u\|^{q-2} + \frac{q(q-1)(q-2)t^3}{6}(a_t + \|u\|)^{q-3}.$

This expansion and (6.2) imply

$D_J(v,u) \ge \varphi(t) - \|u\|^q - q t \|u\|^{q-1}$

$= \frac{q(q-1)t^2}{2}\|u\|^{q-2} + \frac{q(q-1)(q-2)t^3}{6}(a_t + \|u\|)^{q-3}$

$= \frac{q(q-1)t^2}{2}\|u\|^{q-2} \left[ 1 - \frac{(2-q)\,t\,(a_t + \|u\|)^{q-3}}{3\|u\|^{q-2}} \right].$

Note that $a_t + \|u\| > \|u\|$ and $q - 3 < 0$. Hence, $(a_t + \|u\|)^{q-3} < \|u\|^{q-3}$ and

$D_J(v,u) \ge \frac{q(q-1)t^2}{2}\|u\|^{q-2} \left[ 1 - \frac{(2-q)\,t}{3\|u\|} \right].$

Let $b \in (0,1)$ and take $t < \frac{3(1-b)}{2-q}\|u\|$. Then the preceding inequality yields

$D_J(v,u) \ge c_q t^2,$

with

$c_q = \frac{b\,q(q-1)}{2}\|u\|^{q-2}.$ (6.3)

□

Proposition 6.3. Let $K : \ell^2 \to H_2$ be linear and bounded, $J$ be defined by (1.4)
with $1 < q \le 2$. Let $\Gamma$ be the parameter choice according to Morozov's discrepancy
principle (4.1). If the J-minimizing solution $u^\dagger$ of $Ku = g$ satisfies the source
condition (2.2) with a source element $p^\dagger$, then the following convergence rates hold for $k$
sufficiently large:

$\|\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k) - u^\dagger\|_{\ell^q} = O(\sqrt{\delta_k}).$



Proof. We apply inequality (6.1) for $v = \mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)$ and $u = u^\dagger$ and obtain

$D_J^{\mathrm{sym}}(\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k), u^\dagger) \ge c_q \|\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k) - u^\dagger\|_{\ell^q}^2$

for $k$ sufficiently large. This and Theorem 4.3 imply the first assertion.

In order to show the estimate for the subgradients, note that (see, e.g., @, Lemma
1.4.10])

$D_{J^*}(\xi_2, \xi_1) \ge c_r \|\xi_2 - \xi_1\|_{\ell^r}^r$

for any $\xi_1, \xi_2 \in \ell^r$ and for some positive constant $c_r$ depending on $r \ge 2$. Consequently, it
follows from Remark 3.7 that

$\|K^*\mathcal{R}^*_{\Gamma(\delta_k,g_k)}(g_k) - K^*p^\dagger\|_{\ell^r} = O(\delta_k^{1/r}),$

which completes the proof. □

Now we turn to the case of sparse regularization for $q = 1$. Here, one can derive
improved convergence rates in case the solution $u^\dagger$ not only fulfills the source
condition but is also indeed sparse. To be more precise, we define for a given set
$I \subset \mathbb{N}$ the projection $P_I$ by

$(P_I u)_k = \begin{cases} u_k, & k \in I, \\ 0, & k \notin I, \end{cases}$

and require the following
Assumption 2.

i) The solutions $u^\dagger$ of (1.1) satisfy the source condition (2.2) with source element
$p^\dagger$.
ii) For $K^*p^\dagger = \xi$ and $I = \{k \mid |\xi_k| = 1\}$, one has that the quantity $\theta = \sup\{|\xi_k| \mid k \notin
I\}$ is strictly smaller than one.
iii) The operator $KP_I : \ell^1 \to H$ is injective in the sense that $P_I u \ne P_I v$ implies
$KP_I u \ne KP_I v$.
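
The quantities in Assumption 2 are easy to inspect numerically once everything is truncated to finitely many coefficients. The following sketch does this under assumptions that go beyond the text: `xi` is a finite vector standing in for $K^*p^\dagger$, `K` is a matrix, equality $|\xi_k| = 1$ is tested up to a tolerance, and all function names are hypothetical.

```python
import numpy as np


def support_and_theta(xi, tol=1e-12):
    """Return I = {k : |xi_k| = 1} (up to tol) and theta = max{|xi_k| : k not in I}."""
    xi = np.asarray(xi, dtype=float)
    on_support = np.abs(np.abs(xi) - 1.0) <= tol
    I = np.flatnonzero(on_support)
    off_support = np.abs(xi[~on_support])
    theta = float(off_support.max()) if off_support.size else 0.0
    return I, theta


def project(u, I):
    """Projection P_I: keep the entries of u with index in I, set the others to zero."""
    u = np.asarray(u, dtype=float)
    out = np.zeros_like(u)
    out[I] = u[I]
    return out


def restricted_injective(K, I):
    """Check injectivity of K P_I via the rank of the columns of K indexed by I."""
    if len(I) == 0:
        return True
    return bool(np.linalg.matrix_rank(K[:, I]) == len(I))


xi = [1.0, 0.3, -1.0, -0.7]
I, theta = support_and_theta(xi)
K = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 1.0, 1.0, 1.0]])
```

In this toy example `I` consists of the indices 0 and 2, `theta` equals 0.7, and the two selected columns of `K` are linearly independent, so conditions ii) and iii) hold for the truncated problem.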

We start with the following lemma which can be traced back to [13] (see also
II).

Lemma 6.4. Assume that Assumption 2 is satisfied. Then, there exist constants
$\beta_1, \beta_2 > 0$ such that

$J(u) - J(u^\dagger) \ge \beta_1 J(u - u^\dagger) - \beta_2 \|K(u - u^\dagger)\|.$
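Proof. Due to Assumption 2 iii) the operator $KP_I$ is injective and hence, there
exists $c > 0$ such that $\|KP_I u\| \ge c \|P_I u\|_{\ell^1}$ for all $u \in \ell^1$. Since $P_{I^c} u^\dagger = 0$, we have
$KP_I(u - u^\dagger) = K(u - u^\dagger) - KP_{I^c}u$, and we estimate

$J(u - u^\dagger) = \|u - u^\dagger\|_{\ell^1} = \|P_I(u - u^\dagger)\|_{\ell^1} + \|P_{I^c} u\|_{\ell^1}$
$\le \frac{1}{c}\|KP_I(u - u^\dagger)\| + \|P_{I^c} u\|_{\ell^1}$
$\le \frac{1}{c}\|K(u - u^\dagger)\| + \left(\frac{\|K\|}{c} + 1\right)\|P_{I^c} u\|_{\ell^1}.$

Since $u^\dagger_k = 0$ for $k \notin I$, Assumption 2 i) and ii) imply

$\|P_{I^c} u\|_{\ell^1} = \sum_{k \notin I} |u_k|$
$\le \frac{1}{1-\theta} \sum_{k \notin I} (1 - |\xi_k|)\,|u_k|$
$\le \frac{1}{1-\theta} \left( J(u) - J(u^\dagger) - \langle \xi, u - u^\dagger \rangle \right)$
$\le \frac{1}{1-\theta} \left( J(u) - J(u^\dagger) + \|p^\dagger\| \, \|K(u - u^\dagger)\| \right).$

Combining both estimates gives

$J(u - u^\dagger) \le \left( \frac{1}{c} + \|p^\dagger\| \frac{\|K\|/c + 1}{1-\theta} \right) \|K(u - u^\dagger)\| + \frac{\|K\|/c + 1}{1-\theta} \left( J(u) - J(u^\dagger) \right),$

which yields the assertion with

$\beta_1 = \frac{c\,(1-\theta)}{\|K\| + c}, \qquad \beta_2 = \frac{1-\theta}{\|K\| + c} + \|p^\dagger\|.$

□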

Remark 6.5. We remark on Assumption 2. Statement ii) is related to the notion
of "strict sparsity pattern" in Q. To get a practically relevant condition, one may
replace this with the assumption that the range of $K^*$ is contained in some $\ell^p$ with
$p < \infty$ (since in this case the sequence $\xi$ has to tend to zero). This also implies
that $I$ is finite. Alternatively one may also work with $K : \ell^2 \to H$ (which implies
$K^* : H \to \ell^2$).

Assumption 2 iii) is a restricted injectivity condition. Since one needs to know the
set $I$ to verify this in advance, one often uses the "finite basis injectivity property"
(FBI property) from [3, 18], which states that $KP_I$ is injective for all finite sets $I$.
This condition can be checked in advance and hence, it seems more practical.

Now we treat the case of noisy data and show that the application of Morozov's 
discrepancy principle leads to optimal convergence rates. 

Theorem 6.6. Let $u^\dagger$ be a J-minimizing solution of $Ku = g$ and assume that
$\Gamma$ is the parameter choice according to Morozov's discrepancy principle (4.1). Then,
one has

$\|\mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k) - u^\dagger\|_{\ell^1} = O(\delta_k).$
Proof. We estimate the symmetric distance from below using Lemma 6.4. To this
end, set $u_k = \mathcal{R}_{\Gamma(\delta_k,g_k)}(g_k)$ and observe that

$D_J^{\mathrm{sym}}(u_k, u^\dagger) \ge D_J(u_k, u^\dagger)$

$= J(u_k) - J(u^\dagger) - \langle K^*p^\dagger, u_k - u^\dagger \rangle$

$\ge \beta_1 J(u_k - u^\dagger) - \beta_2 \|Ku_k - g\| - \langle p^\dagger, Ku_k - g \rangle.$

Rearranging and using the Cauchy-Schwarz inequality leads to

$\beta_1 J(u_k - u^\dagger) \le D_J^{\mathrm{sym}}(u_k, u^\dagger) + (\beta_2 + \|p^\dagger\|) \|Ku_k - g\|.$

From the definition of Morozov's discrepancy principle (4.1) and Theorem 4.3 we
finally conclude the proof. □

6.2. Convergence rate for $n \to \infty$ in the noise-free case. Another consequence
of our analysis of the ALM is that we can prove convergence rates of the ALM
iteration with noise-free data which are superior to previous results.

Proposition 6.7. Let $J$ be according to (1.4) with $q = 1$, $u^\dagger$ be a J-minimizing
solution of $Ku = g$ and $p_0 = 0$. Then there exists a constant $C > 0$ such that the
iterates $u_n$ of the ALM fulfill

$\|u_n - u^\dagger\|_{\ell^1} \le \frac{C}{t_n}.$

Proof. Since $K^*p_n \in \partial J(u_n)$, one has

$J(u_n) - J(u^\dagger) \le -\langle K^*p_n, u^\dagger - u_n \rangle$
$= -\langle p_n, g - Ku_n \rangle$
$\le \|p_n\| \, \|g - Ku_n\|.$

Now we use Lemma 6.4 to obtain

$\beta_1 J(u_n - u^\dagger) \le J(u_n) - J(u^\dagger) + \beta_2 \|Ku_n - g\|$
$\le (\|p_n\| + \beta_2) \|Ku_n - g\|.$

Theorem 3.4 (with $\delta = 0$) gives

(7 + /32)||p t || 



J(u n - u t ) < 



which proves the assertion. □ 

This proposition shows that the ALM can calculate approximate solutions to the
so-called Basis Pursuit problem Q of finding minimal $\ell^1$-norm solutions of
underdetermined linear systems and also gives an estimate on the speed of convergence of the
objective value.
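
To make the iteration concrete, the following sketch runs the ALM with $J = \|\cdot\|_{\ell^1}$ and stops it by the discrepancy principle. It rests on several assumptions that this section does not restate: the update is taken to be $u_n \in \operatorname{argmin}_u \frac{\tau}{2}\|Ku - g\|^2 + J(u) - \langle p_{n-1}, Ku - g \rangle$ followed by $p_n = p_{n-1} + \tau(g - Ku_n)$ with a constant step $\tau$; the operator $K$ is diagonal with nonzero entries `d`, so that each subproblem is solved exactly by soft thresholding; and the stopping test is the first residual at most $\rho\delta$. The function names are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Componentwise minimizer of |u| + (u - x)^2 / (2 t)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def alm_diagonal_l1(d, g, delta, rho=1.1, tau=1.0, max_iter=100):
    """ALM / Bregman iteration for J = l1-norm and a diagonal operator K = diag(d).

    Stops at the first n with ||K u_n - g|| <= rho * delta (assumed form of rule (4.1)).
    Returns the iterate u_n, the dual variable p_n and the stopping index n.
    """
    d = np.asarray(d, dtype=float)
    g = np.asarray(g, dtype=float)
    if np.any(d == 0.0):
        raise ValueError("all diagonal entries must be nonzero in this sketch")
    p = np.zeros_like(g)          # p_0 = 0
    u = np.zeros_like(g)
    for n in range(1, max_iter + 1):
        b = g + p / tau           # shifted data of the n-th subproblem
        u = soft_threshold(b / d, 1.0 / (tau * d ** 2))
        p = p + tau * (g - d * u)  # dual update; d * p is a subgradient of J at u
        if np.linalg.norm(d * u - g) <= rho * delta:
            return u, p, n
    return u, p, max_iter


# Noisy observation of the 1-sparse vector (3, 0, 0) with noise level delta = 0.2.
u_stop, p_stop, n_stop = alm_diagonal_l1(d=[1.0, 1.0, 1.0], g=[3.0, 0.2, 0.0], delta=0.2)
```

In this toy run the first iterate is the soft-thresholded data $(2, 0, 0)$, and the second iterate $(3, 0, 0)$ already meets the discrepancy test, so the iteration stops before the noisy second coefficient enters the support. For a non-diagonal $K$ the subproblem has no closed form and would need an inner solver.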

6.3. Implications for Compressed Sensing. Finally we remark on the relation
of our results to the theory of compressed sensing: Linear convergence rates for
the variational regularization with the $\ell^1$-norm have been shown in [13], EH under a source
condition and some assumptions on the operator $K$. A similar result has been proven
(see [7]) in the finite dimensional setting of compressed sensing, by using the restricted
isometry property condition. In the latter setting, [14] established the following
connection between the above-mentioned conditions (see [14, part of Proposition 5.3 and
Theorem 4.7]):

Proposition 6.8. Assume that $Ku^\dagger = g$. Assume that $K$ satisfies the
s-restricted isometry property and let $u^\dagger$ be an s-sparse solution of the equation. Then
$u^\dagger$ satisfies the source condition and $KP_I$ is injective, with $I$ given by Lemma 6.4.

Based on this result and on the ones in this section, one can immediately state
the following:

Proposition 6.9. Assume that $K$ satisfies the s-restricted isometry property
and let $u^\dagger$ be an s-sparse solution of the equation. Then linear convergence rates hold
for Bregman iterations in the noise-free case and in the noisy data case when the
discrepancy principle is employed.

7. Conclusion. In this work we showed that Morozov's discrepancy principle
(4.1) applied to the Augmented Lagrangian Method (ALM)Q] leads to a regularization
method for linear inverse problems $Ku = g$. This gives a theoretical justification for
the observation that the discrepancy principle provides useful results in practical
situations.

We used a dual characterization of the ALM in order to derive explicit error
bounds for the Bregman distance between the iterates and a true J-minimizing
solution $u^\dagger$ of $Ku = g$, if $u^\dagger$ satisfies the source condition

$K^*p^\dagger \in \partial J(u^\dagger)$

for a source element $p^\dagger$. In this case, error bounds for the Bregman distance
(with respect to $J^*$) between the dual iterates in the ALM and $p^\dagger$ were also obtained.
We also showed that a sufficient condition for the source condition to hold is the
existence of finite accumulation points in the sequence of stopping indices chosen by
the discrepancy principle.

We applied our general results to particular situations which have a special appeal
for problems arising in imaging.

Firstly, we considered the case of total variation regularization where we were
able to show that the ALM converges strictly in $BV(\Omega)$ and to establish convergence
rates with respect to an equivalent metric.

Secondly, we studied sparse regularization on $\ell^2$, more precisely when $J$ coincides
with the $\ell^q$-norm ($q \in [1,2]$). Aside from $\sqrt{\delta}$-rates in the $\ell^q$-norm for $q > 1$, we were
able to prove linear convergence rates for the particularly interesting case of $\ell^1$ (under
suitable regularity conditions on $u^\dagger$). The sequence of dual iterates in the ALM in
the latter case carries important information on the support of the solution. The
conjugate function $J^*$ of the $\ell^1$-norm, however, degenerates to an indicator function.
As a consequence, the general estimates for the dual variables do not reveal much
insight into their convergence behavior. It is still an open issue whether one can obtain
more relevant estimates for the dual variables.